Resource search - document similarity
Search results
-
0 downloads:
In this paper, we propose a method for retrieving text from document images using a similarity
measure based on an N-gram algorithm.
-
-
0 downloads:
Information Retrieval (IR) is the discipline that deals with retrieval of unstructured
data, especially textual documents, in response to a query or topic statement, which
may itself be unstructured, e.g., a sentence or even another document, or which may
be s
-
-
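0 downloads:
Execution flow:
1. The user supplies the parameters: the value of K and the paths to the training and test data;
2. Read the training and test dataset files, using the ArffFileReader class to load them into the InstanceSet data structure;
3. Using the similarity measure described above, find for each Instance in the test set the K most similar Instances in the training set, classify by voting, and store the result in the Instance's member variable targetGuess;
4. Evaluate the classification results, including classification accuracy and the Precision and Recall for each class; Con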
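
The flow above can be sketched in a few lines of Python. This is only an illustration, not the listed package's code: the listing does not show its similarity measure, so Euclidean distance is assumed here, and the names `knn_predict` and `evaluate` are hypothetical. ARFF loading is left out; the training and test sets are plain lists of `(features, label)` pairs.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Assumed distance measure; the original similarity measure is not shown.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Return the majority label among the k training items nearest to query."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, Counter keeps first-seen order, so the nearer neighbour's label wins.
    return votes.most_common(1)[0][0]


def evaluate(train, test, k):
    """Return (accuracy, {label: (precision, recall)}) for the test set."""
    guesses = [knn_predict(train, features, k) for features, _ in test]
    truths = [label for _, label in test]
    accuracy = sum(g == t for g, t in zip(guesses, truths)) / len(test)
    per_class = {}
    for c in sorted(set(truths) | set(guesses)):
        tp = sum(g == c and t == c for g, t in zip(guesses, truths))
        predicted = sum(g == c for g in guesses)
        actual = sum(t == c for t in truths)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        per_class[c] = (precision, recall)
    return accuracy, per_class


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
test = [((0.2, 0.1), "a"), ((5.5, 5), "b")]
accuracy, per_class = evaluate(train, test, k=3)
```

Sorting the whole training set for every query is the simplest possible approach; for large datasets a spatial index or a partial sort would likely be preferable.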
-
-
0 downloads:
Undergraduate graduation project
for determining the degree of similarity between electronic documents,
implemented in the VC6.0 environment.
-
-
0 downloads:
Research on a keyword-based automatic classification algorithm for Web documents. Topics: document keywords, semantic similarity, clustering algorithm, HowNet, topological network graph, Chinese word segmentation.
-
-
2 downloads:
Java source code for computing text distance and text similarity, with test documents included.
-
-
0 downloads:
Analyzes document similarity using the VSM (vector space model), with the first one hundred documents used as queries.
-
-
0 downloads:
Uses a graph model to represent the concepts in an ontology and the semantic relations between them, expands a concept and a document each into a semantic fuzzy set, and computes the similarity between the fuzzy sets.
-
-
0 downloads:
Text similarity matching using the LCS algorithm to detect plagiarism in text documents.
-
-
0 downloads:
A method for computing XML similarity, with a concrete approach and program for handling similarity intelligently: A Recursive Method to Compute Similarity of XML Documents.
-
-
0 downloads:
Code that judges the similarity between two words by scanning documents, using three kinds of context window: the whole document, a sliding window of distance 1, and a sliding window of distance 2.
-
-
0 downloads:
The package contains software for document clustering based on the cosine similarity measure.
-
-
0 downloads:
Automatically creates a directory and generates 500,000 English documents of 100 to 120 words each; among the randomly generated documents, those with a Jaccard similarity above 0.9 number 10
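
For reference, the Jaccard similarity of two documents can be sketched as the size of the intersection of their term sets divided by the size of their union. The version below assumes lowercase, whitespace-separated words as the terms; the listed generator may use a different unit (for example character shingles), and the function name `jaccard` is hypothetical.

```python
def jaccard(doc_a, doc_b):
    """Jaccard similarity of the word sets of two documents, in [0, 1]."""
    set_a = set(doc_a.lower().split())
    set_b = set(doc_b.lower().split())
    if not set_a and not set_b:
        # Two empty documents are treated as identical (a convention chosen here).
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


score = jaccard("a b c d", "a b c e")  # 3 shared words out of 5 distinct words
```

Because the measure works on sets, repeated words and word order do not affect the score.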
-
-
0 downloads:
Finds the longest common substring of strings s1 and s2 as a measure of document similarity that reflects word order.
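
A minimal dynamic-programming sketch of the longest common substring follows. It is not the listed code: the function names are hypothetical, and normalizing by the longer string's length to get a score in [0, 1] is an assumption made here for illustration.

```python
def longest_common_substring(s1, s2):
    """Return the longest contiguous substring shared by s1 and s2."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common suffix of s1[:i-1] and s2[:j].
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s1[best_end - best_len:best_end]


def substring_similarity(s1, s2):
    """Assumed normalization: common substring length over the longer length."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return len(longest_common_substring(s1, s2)) / longest


common = longest_common_substring("abcdef", "zcdeq")
```

Keeping only two rows of the table holds memory at O(len(s2)) while the running time stays O(len(s1) * len(s2)).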
-
-
1 download:
Algorithm idea: extract the TF/IDF weights of each document, then compute the similarity of two documents from the cosine of the angle between their multidimensional vectors; text clustering can then be implemented with the standard k-means algorithm. The source code is written in Java.
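
The weighting and cosine steps of this idea can be sketched as below; the k-means step is not covered. This is an illustration in Python rather than the listed Java source, and several details are assumptions: whitespace tokenization, term frequency normalized by document length, and the unsmoothed IDF `log(N / df)`. The names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["apple banana apple", "apple banana banana", "car engine wheel"]
vecs = tfidf_vectors(docs)
close = cosine(vecs[0], vecs[1])
far = cosine(vecs[0], vecs[2])
```

With the unsmoothed IDF, a term that occurs in every document gets weight zero, so it contributes nothing to the similarity; smoothed variants avoid this.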
-
-
0 downloads:
500 Chinese documents collected with a crawler for text preprocessing, including word segmentation, removal of special symbols, and the final similarity computation.
-
-
1 download:
This file contains a paper on compressed-sensing image reconstruction based on rank minimization, together with its code implementation. The method exploits the non-local self-similarity of the image to build a low-rank matrix model and reconstruct the image.
-